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Cn In the present paper we consider Laplace deconvolution on the basis of discrete noisy 

data observed on the interval which length may increase with a sample size. Although this 
problem arises in a variety of applications, to the best of our knowledge, it has not been 
systematically studied in statistical literature and the objective is to fill this gap. The present 
paper essentially provides the first comprehensive statistical treatment of Laplace deconvolution 
C problem. The main contribution of the paper is the explicit construction of the Laplace 

'"^ deconvolution estimator. We show that the original Laplace deconvolution problem can be 

CSJ essentially reduced to estimating regression function and its derivatives on the interval of growing 

, |p»^ length T„. Whereas the forms of the estimators essentially remain standard, the choices of the 

^*0 parameters and the minimax convergence rates, which are expressed in terms of T^/n in this 

r — 

p^ case, are affected by the asymptotic growth of the length of the interval. 

r — We derive an adaptive kernel estimator of the function of interest, and establish its 

^^ asymptotic minimaxity over a range of Sobolev classes. We illustrate the theory by examples of 

^_H construction of explicit expressions of Laplace deconvolution estimators. A limited simulation 

J>. study shows that, in addition to providing asymptotic optimality as the number of observations 

k> turns to infinity, the proposed estimator demonstrates good performance in finite sample 

^ examples. The paper is concluded by a study of application of Laplace deconvolution to analysis 
of Dynamic Contrast Enhanced Computed Tomography data. 
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1 Introduction 



Mathematical modeling of a variety of problems in population dynamics, mathematical physics, 
theory of superfluidity and many others leads to the convolution type Volterra equation of the first 
kind of the form 

(1.1) 



q{t) = [ g{t- T)fiT)dT, t > 
Jo 



where q{-) is the known or observed function, g(-) is the known kernel and /(•) is the unknown 
function to be solved for (see, e.g. Gripenberg, Louden and Steffans, 1990). Two motivating 
examples from computed tomography and fluorescence spectroscopy will be presented later in the 



paper. The problem (1.1) is also known as Laplace deconvolution problem. 



In practice, however, one typically has discrete observations of a function / on a finite interval 
which, in addition, are corrupted by noise that leads to the following discrete noisy version of 



equation (1.1): 



y{U 



g{U -T)f{T)dT + aei 



l,...,n. 



(1.2) 



where < ti < ... < in < 7n, Ci are i.i.d. A^(0, 1) and T„ may grow with n. 



Formally, by setting g{t) = f{t) = for t < 0, equation (1.1) can be viewed as a particular case 
of the Fredholm convolution equation 



hit) 



g{t - T)f{T)dT, 



whose discrete stochastic version 



y{ti) 



9{ti-T)f{T)dT + a€i, i 



l,...,n. 



(1.3) 



(1.4) 



known also as Fourier deconvolution problem, has been extensively studied in the last thirty years 
(see, for example, Carroll and Hall, 1988; Delaigle, Hall and Meister, 2008; Diggle and Hall, 1993; 
Fan, 1991; Fan and Koo, 2002; Johnstone et a/., 2004; Pensky and Vidakovic, 1999; Stefanski and 
Carrol, 1990 among others) 



However, such an approach to solving (1.1) and (1.2) is very misleading. "Much of the classical 



theory of Fredholm equations reduces to mere trivialities when applied to Volterra equations. On 
the other hand, Volterra equations exhibit a variety of phenomena unknown to Fredholm theory" 
( Gripenberg, Louden and Steffans, 1990, p. 3). First, artificial zero extension of g{t) and /(t) 
for t < evidently affects their regularity at zero. Furthermore, noisy Laplace deconvolution in 



(1.2) is very different from its Fredholm counterpart (1.4). To start with, the measurements of the 



right-hand side of (1.2) are available on the increasing interval [0,r„], which makes application of 



usual discrete Fourier transform typically used for solving (1.3) impossible since the latter would 



assume periodicity of a function on [0,T„]. Moreover, for the noisy measurements in (1.4), the 



solution by Fourier transform may not vanish for t < and, as a result, may be very different from 
a true solution / of the equation. 



If both functions / and g are positive and integrable, problem ( 1.1 ) can be interpreted as density 
deconvolution where ah three random variables are positive. However, methodologies which are 
designed for density deconvolution are not helpful either, since in the density deconvolution set 
up observations are absent on a particular part of the domain only if the corresponding density 
function is very low. Therefore, one can easily circumvent the challenge imposed by absence of 
observations for large values of t by setting the corresponding density estimator to zero when 
t > Tn. This, however, is not at all true in the regression set up where lack of observations is due 
entirely to experimental design and has no relation to the values of the estimated function. The 
consequence of this fact is that in the regression set up convergence rates indeed depend on the 



value of T„. The last resort is to treat problem (1.2) as a general ill-posed problem and apply some 



general methodologies to its solution (see, e.g., Bissfoantz et al, 2007 or Golubev, 2010). However, 
this approach ignores special features of the Volterra equation that allow, under reasonably mild 



restrictions, to derive a closed form for the solution of (1.2). 



The mathematical theory of (noiseless) convolution type Volterra equations is well developed 



(see, e.g., Gripenberg, Londen and Staffans, 1990) and the exact solution of (1.1) can be 
obtained through Laplace transform. However, direct application of Laplace transform for discrete 
measurements faces serious conceptual and numerical problems. The inverse Laplace transform is 
usually found by application of tables of inverse Laplace transforms, partial fraction decomposition 
or series expansion (see, e.g., Polyanin and Manzhirov, 1998), neither of which is applicable in 
the case of the discrete noisy version of Laplace deconvolution. Only few applied mathematicians 



took an effort to solve the problem using discrete measurements in the LHS of (1.3). Ameloot and 
Hendrickx (1983) applied Laplace deconvolution for the analysis of fluorescence curves and used 
a parametric presentation of the solution / as a sum of exponential functions with parameters 
evaluated by minimizing discrepancy with the right-hand side. In a somewhat similar manner, 
Maleknejad et al. (2007) proposed to expand the unknown solution over a wavelet basis and find 
the coefficients via the least squares algorithm. Lien et al. (2008), following Weeks (1966), studied 
numerical inversion of the Laplace transform using Laguerre functions. Finally, Lamm (1996) and 



Cinzori and Lamm (2000) used discretization of the equation (1.1) and applied various versions 
of the Tikhonov regularization technique. However, in all of the above papers, the noise in the 
measurements was either ignored or treated as deterministic. The presence of random noise in 



(1.2) makes the problem even more challenging. 



For the reasons listed above, estimation of / from discrete noisy observations y in (1.2) 
requires development of a novel approach. Unlike Fourier deconvolution that has been intensively 
studied in statistical literature (see references above), Laplace deconvolution received virtually no 
attention within statistical framework. To the best of our knowledge, the only paper which tackles 
the problem is Dey, Martin and Ruymgaart (1998) which considers a noisy version of Laplace 
deconvolution with a very specific kernel of the form g{t) = 6e~"*. The authors use the fact that. 



in this case, the sohition of the equation (1.1) satisfies a particular linear differential equation and, 
hence, can be recovered using q{t) and its derivative q'{t). For this particular kind of kernel, the 
authors derived convergence rates for the quadratic risk of the proposed estimators, as n increases, 
under the assumption that the r-th derivative of / is continuous on (0,oo). However, in Dey, 
Martin and Ruymgaart (1998) it is assumed that data is available on the whole positive half- line 
(i.e. Tn = oo) and that r is known (i.e., the estimator is not adaptive). 

The objective of the present paper is to fill in the gap and to develop comprehensive 
general statistical methodology for Laplace deconvolution problem, namely: to establish minimax 
convergence rates over various classes of functions and to derive the rate-optimal (in the minimax 



sense) estimators. In particular, one of the main features of the considered model (1.2) is that the 
data is observed on the interval of asymptotically increasing length, where T„ — )• oo as n — )• oo. 
This is indeed a reasonable assumption since, as n is growing, demands on the improvements of 
the estimation precision require to decrease the bias by sampling q{t) for larger and larger values 
of t. Dependence of T on n may not significantly affect estimation procedures but evidently leads 
to different convergence rates that are formulated in terms of T^/n. 

We propose a Laplace deconvolution estimator that is explicitly derived and is adaptively 
asymptotically minimax over a wide range of Sobolev classes. It is based, in fact, on estimating 



Q = f * 9 ^-iid its derivatives from noisy data y in (1.2). Thus, one can use the numerous existing 
techniques for nonpar ametric estimation of a function and its derivatives. In particular, we employ 
kernel estimators with the global bandwidth selected by Lepski procedure. 

The rest of the paper is organized as follows. In Section [2] we discuss two problems which 
motivate our study: analysis of DCE-CT data and estimation of the impulse response function in 
fluorescence spectroscopy. Section [3] introduces notations and assumptions used throughout the 
paper. In the main Section |4] we explicitly derive Laplace deconvolution estimator and establish its 
asymptotic adaptive minimaxity over entire range of Sobolev classes. Section [5] contains examples 
of explicit estimators of Laplace deconvolution for various types of kernels g. Section [6] presents 
results of a limited simulation study while Section [7] provides an example with functions which are 
typically encountered in real life medical applications of Laplace deconvolution. Section |8] concludes 
the paper with discussion. All the proofs are given in Appendix. 

2 Applications of Laplace deconvolution 

2.1 Dynamic Contrast Enhanced Computed Tomography data 

The study in this paper has been motivated by analysis of Dynamic Contrast Enhanced with 
Computed Tomography (DCE-CT) data. DCE-CT is widely used in medical imaging of brain 
structures or cancerous tumors (see, e.g., Cao et al, 2010; Goh et al, 2005; Goh and Padhani, 
2007; Cuenod et al, 2006; Cuenod et al, 2011; Miles, 2003; Padhani and Harvey, 2005 and Bisdas 



et al, 2007). This imaging procedure has great potential for cancer detection and characterization, 
as well as for monitoring in vivo the effects of treatments. It is also used, for example, after a 
stroke for prognostic purposes. 




time 

Tissue enhancement 



Figure 1: DCE-CT experiment and contrast agent circulation. The patient body is 
materialized by the mixed arrow. 

The DCE-CT experiments follow the diffusion of a bolus of a contrast agent injected into a vein. 
At the microscopic level, for a given voxel of interest x having unit volume, denote the number of 
arriving particles at time t by g{t), the number of particles in the voxel by at time t by Y{t) and 
the random lapse of time during which a particle sojourns in the voxel by S{t). Assuming sojourn 
times for different particles to be i.i.d. with the c.d.f. F, one obtains the following equation for the 
average number of contrast agent particles at the moment t 

EY{t)= [ g{T)dT - f g{t-T)P{Sl<T)dT= f g{t-T){l-F{T))dT, 

Jo Jo Jo 



arrived before time t 



left before time t 



where the expectation is taken under the unknown distribution of the sojourn times. In reality, 
one does not know KY(t) and has discrete noisy observations 

Y{ti)=EY{ti)+aei. 

Medical doctors are interested in a reproducible quantification of the blood flow inside the 
tissue which is characterized by /(t) = 1 — F{t) since this quantity is independent of the number 
of particles of contrast agent which arrive into the bolus described by g{t). The sequential imaging 
acquisition is illustrated by Figure [T] The contrast agent arrives with the oxygenated blood through 



the aorta (red arrow) where its concentration within unit volume voxel is first measured when it 
passes through the CT cross section (red box). This measured concentration within a unit volume 
voxel inside the aorta is called the Arterial Input Function (AIF). Subsequently, the contrast agent 
enters the arterial system and it is assumed that its concentration does not change during this 
phase. The exchange within the tissue of both oxygen and contrast agent occurs after the arterial 
phase and the concentration of contrast agent during this exchange is measured in all tissue voxels 
(grey voxel in the zoom) inside the CT cross section. Later the contrast agent return to the venous 
system with the de-oxygenated blood (blue arrow). 

To complete description of this experiment, one has to take into account that only a small 
proportion a of the AIF enters the tissue voxel and that there is a delay b between the measurement 
inside the aorta of the contrast agent concentration (first cross of the CT section) and the arrival 
inside the tissue. This leads to the following complete model: 

Y{ti)= [' aAIF{ti-T){l-F{T))dT + aei, i = l,...,n. (2.1) 

The value of delay S can be measured with the small error using the decay between the jumps after 
the injection of the contrast agent inside the aorta and the tissue. Unfortunately, evaluation of 
the proportion a is a much harder task which is realized with a larger error. In the spirit of this 



complete model (2.1) for DCE-CT experiments, the model given by (1.2) is a necessary theoretical 



step before obtaining medical answers provided by model (2.1) 



2.2 Fluorescence spectroscopy data 



Another line of applications of Laplace deconvolution (1.2) is modeling of time-resolved 
measurements in fluorescence spectroscopy, particularly, for studies of biological macromolecules 
and for cellular imaging (see, e.g., Ameloot and Hendrickx, 1983; Ameloot et al, 1984; Gafni, 
Modlin and Brand, 1975; McKinnon, Szabo and Miller, 1977; O'Connor, Ware and Andre, 1979, 
and also the monograph of Lakowicz, 2006 and references therein). 

At present, in fluorescence spectroscopy, most of the time-domain measurements are carried 
out using time-correlated single-photon counting. The measured intensity decay is represented by 
N{tk), the number of photons that were detected within the time interval (t^, t^ + Af) and appears 
as a convolution of the response function I{t) with the lamp function L{t). One can imagine the 
excitation pulse to be a series of (5-functions with different amplitudes. Each (5-function excitation 
is assumed to excite an impulse response L{tk)I{t — tk)At at time t > tk, with the amplitude at 
time tk proportional to the excitation intensity L{tk)- The measured decay N{t) is the sum of the 
impulse responses created by all the individual 5-function excitation pulses occurring until time 
t: N{t) = J2t'.=o^(^k)I(t — tk)At. As At — )• oo, the sum in the right hand side can be replaced 
by the integral, and with a change of variables t — s = x, the last equation can be written as 



N{t) = Jq L{t — x)I{x)dx. The experimental data come in the form of measurements N{tk) which 
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are contaminated by random noise and, therefore, can be modeled by equation (1.2). The objective 



is to determine the impulse response function /(x) that best matches the experimental data which 



leads to the equation of the form (1.2). 



3 Notations and assumptions 

We start from introducing several notations and assumptions used throughout the paper. The 
Laplace transform of a function /(t) is denoted by F{s), that is, F{s) = L e~^^f{t)dt, and the 
inverse Laplace transform of F calculated at the point t by C^^[F]{t). The Lfc(M+ )-norm of the 
function h is denoted by ||/i||fc and ||/i||oo is the supremum norm of h. If A; = 2 and there is no 
ambiguity, we shall omit the subscript in the notation of the norm, i.e. \\h\\ = ||/i||2- We use the 
standard notation W'^'^{M'^) for a Sobolev space of functions on [0, oo) that have s derivatives with 
finite Lp-norms and, in particular, for p = 2, H^CR^) = Ty*'^(M^). In what follows, we shall omit 
M^ in the notations of the norms and functional spaces and, unless the opposite is stated, assume 
that all functions are defined on the nonnegative part of the real line. 



Assume now the following conditions on the unknown / and the known kernel g in (1.1): 
(Al) f gH"" 
(A2) There exists an integer r > 1 such that 

U)(^. f 0, if i = 0,...,r-2. 



^ ' ' 1 S./O, if j = r-l. 



(A3) g € W'^ n H"'+'^-^ . 



In terms of Laplace transforms, equation (1.1) can be written as G{s)F{s) = Q{s). We place 
the following restriction on G: 

(A4) Let J7 be a collection of distinct zeros s^j of G. Then s* = max^^gfj Re{suj) < 0. 

It can be shown that assumption (A4) corresponds to the condition that Fourier transform 
T[g]{z) of g is bounded below by some negative power of \z\, |->^[5'](^;)| > Clzl""^ for some positive 
C and a, which is the usual assumption in Fourier deconvolution. 

Finally, we impose the following assumption on T„ and design points tj, i = 1, ...,n: 



that maxj \ti — ti-i\ < iin^^Tn- 



(A5) Let Tn be such that T„ — )■ cxd but n T^— >-Oasn— >-cx) and there exist 1 < // < oo such 



In what follows, we use the symbol C for a generic positive constant, independent of the sample 
size n, which may take different values at different places. 



4 Main results 

4.1 Lower bounds for the minimax risk 

In order to establish a benchmark for an estimator of an unknown function / from its noisy 



Laplace convolution (1.2) we derive the asymptotic minimax lower bounds for the L [0,Tn]-nsk 
of any estimator /„. It turns out that, unlike in the density deconvolution problem or Fourier 
deconvolution set-up, the rates of convergence depend on the length of the interval Tn and are 
expressed in terms of the ratio T^/n: 

Theorem 1. Let assumptions (Al)-(A5) hold. Then, there exists a constant C > independent 
of n such that 

2m 
/r2\ 2(m + r) + l 

inf sup E\\U-f\\l^T^^>c(^] , (4.1) 

where the infimum is taken over all possible estimators fn of f , and, therefore, 

2m 
' T \ 2(m + r) + l 



inf sup E\\U-f\\l^^^>C 



4.2 Solution of noiseless Volterra equation 

As we have already mentioned, unlike Fourier deconvolution, an estimator /„ of the unknown / in 



(1.2) can be obtained explicitly in the closed form. To understand the motivation for the proposed 



/„ we find first the exact solution of the noiseless Volterra equation (1.1). 



Taking derivatives of both sides of (1.1) under Assumption (A2), one obtains 



q^'\t) = f g^^\t-T)f{T)dT, j = l,...,r-l; 
Jo 



q^^\t) 



Brf{t)+ fg^'Ht-T)fiT)dT, (4.2) 

Jo 



Keeping differentiating q and using (4.2), obtain 

Q(^+i)(t) = Brnt) + g('\t)fiO)+ fg^^\T)nt-T)dT, 

Jo 

m-1 ^t 

Jr+m)j 



\t) = Brf^"'\t) + y^g^^+^Ht)f(^H0)+ g('\T)f("^\t-T)dT. 
Then, under Assumptions (Al) and (A3), one has q^^~^^> G L2 and, hence, q £ H^^^"^' . 



In addition, due to Assumptions (Al) and (A3), (4.2) implies that q^^',g^^' S Li and, therefore 



one can use then the following known facts from the theory of Volterra equations: 



1. there exists a unique solution (/> of the equation 



Jo 
cahed a resolvent of g^^> (see Theorem 3.1 of Gripenberg, Londen and Staffans, 1990); 



(4.3) 



2. there exists a unique solution of (1.1) which can be written as 

m = B;\^-\t) - B-' f q(''\t - T)<Pir)d7 

Jo 



(4.4) 



(see Theorem 3.5 of Gripenberg, Londen and Staffans, 1990). 
Therefore, in order to solve the noiseless Volterra equation (|1.1[), one only needs to determine 



a resolvent (j) in (4.3) defined entirely by the r-th derivative g^^' of the (known) kernel g. Taking 



Laplace transform of both sides of (4.3) yields 



GM(s) = Br^is) + GM(s)I>(s) 

where, due to Assumption (A2), G^'^^{s) = s^G{s) — Br- Therefore, (j){t) can be obtained as an 
inverse Laplace transform of <&, where 

s^'Gis) - Br 



'^{s) 



(4.5) 



s^G(s) 

Behavior of the resolvent function (j) is thus determined by the properties G. It turns out (see, 
e.g., Gripenberg, Londen and Steffans (1990), Chapter 7) that, under Assumptions (A2) and (A4), 
G is analytic and, hence, all its zeros are well separated. Moreover, </> can be presented as the sum 
of a polynomial of degree (r — 1) and an absolutely integrable function: 



Theorem 2. Let Assumptions (A1)-(A4) hold. Then, the resolvent <p in {4-5) is of the form 

r-l 
j=0 ^' 



(4.6) 



where (j)i G Li. Hence, by {4-4h f i'n ((l-D can he recovered as 



f{t) = B;^ \q'^^\t)-Y.^r-l-,q^'\t)- f q^-\t 



T)(/)i(T)dr 



(4.7) 



In majority of practical applications, the number of zeros of G is finite and, since G is an analytic 



function, these zeros are of finite orders. In this case, (pi in (4.7) can be obtained explicitly. 



Theorem 3. Let f and g satisfy Assumptions (A1)-(A4). Let si he distinct zeros of G{s) of orders 
ai, respectively, I = 1, ..., M. Set sq = and oq = r. Then, f is of the form 

f{t)=B;^ U^\t)-Y,bjq'-''-^-^\t)- j\{t-x)cp^;\x)dx\ , (4.8) 



where 



(j)i{x) 



aij 



EE 

1=1 j=0 

1 



aij 



x-'e 



j! 



d'^i-i-^ r 



(a/ - 1 -j)! ds°i-J-i 

M min(jf,Q;-l) 



{s-sir~^{s) 



"oj- + Z] Z] 

i=l i=0 



J \ 1—1 



(4.9) 
(4.10) 
(4.11) 



Remark. Note that in Theorems [2] and ^ Assumptions (A2) and (A3) are essential for exphcit 



construction of estimators. However, calculations in (4.6)-(4.11) can be carried out without 



assumption (A4) being valid. Assumption (A4) is only needed to ensure that 0i ^ L\. In particular, 



if the number of zeros is finite, then Re{si) < 0, / = 1, ..., M, implies 4)1 in (4.9) is a sum of products 
of polynomials and exponentials with powers having negative real parts and, hence, (pi £ Li n -L2. 



If some of zeros have positive real parts, expansions in Theorem 3 will still be valid but 



^('^) 



will 



contain terms with exponentials which have positive powers and will grow and magnify the errors 
of estimating q as t tends to infinity. 

4.3 Adaptive estimation of Laplace deconvolution 

Let us consider q^^\t) some estimators of q^^'{t), j = 0, ...,r. Theorems ^ leads to an estimator /„ 



in (1.2) of the semi-explicit form 



r-l 



fn{t) = B;^ I q(-){t) -J2ao,r^i~jQ^'Kt) 

j=0 



gW(t-r)(?:>i(r)d7 



(4.12) 



where function (pi is expressed in terms of the inverse Laplace transform of the completely known 



function $ defined in (4.5). Under slightly more restrictive conditions which are usually satisfied 



(G has a finite number of zeros). Theorem p^ leads to an explicit expression for the estimator with 



defined by formula (4.9): 



r-l 



Ut) = B;^ I qir){t) - Y, b,q(^-^-^){t) - j^ q{t - x)(p'{\x)dx. 



(4.13) 



Note that, unlike (4.12), the integral term in (4.13) involves q rather than q^'^' and hence the 



boundary effects of estimating derivatives do not propagate to interior points of the interval [0, T„]. 
Laplace deconvolution can be therefore essentially reduced to nonparametric estimation of 
Q = f * 9 ^ ^f+m ^ggg Section 4.2) and its derivatives of orders up to r from the discrete noisy 
data in the model 

uiU) = q{ti) + cr€i, i = l,...,n, 
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which is a weh-studied problem and can be performed by a series of various approaches, e.g. kernel 
estimation, splines, local polynomials, wavelets, etc. 

It is important to note however that for the problem at hand, the data is sampled on an interval 
of asymptotically increasing length that calls for necessary modifications of traditional estimators 
and affects the convergence rates of their global L^[0, T^j-risk which are now expressed in terms of 

For illustration, we consider kernel estimation with the global bandwidth selected adaptively 
by Lepski technique. To estimate the l-th derivative of q{t), I = 0, ...,r, t G [0,T„], choose a kernel 
function Ki (not to be confused with the convolution kernel g) of order {u, I) with u > r + m 
satisfying the following conditions: 

(Kl) supp (Ki) = [—1, 1], Ki is twice continuously differentiable and J Kf{t)dt < oo. 

. . r • ,. f 0, j = 0,..., 1-1,1 + 1, ...,u-l, 

Construction of such kernels is described in, e.g., Gasser, Miiller and Mammitzsch (1985). 

Define a well-known Priestley-Chao type kernel estimator of q^' with a (global) bandwidth A;: 

^\t) = 3^ E^^ (^) (t. - U-i)m. (4.14) 

Certain routine boundary corrections are required for t close to the boundaries (see Gasser and 
Miiller, 1984 for details). 

We utilize a general methodology developed by Lepski (e.g., Lepski, 1991) for data-driven 



selection of a bandwidth A; in (4.14). In particular, we apply the global bandwidth version of 
Lepski, Mammen and Spokoiny's (1997) procedure and modify it also for estimating derivatives. 
The resulting procedure for choosing A; in (4.14[) can be described as follows. For each I, 



< I < r and the corresponding kernel Ki of order (u, /), where u> r + m, consider the geometric 
grid of bandwidths A; , where 

Az = {Aj E [(n-^T^) 5iTT,r„] : X, = Tna~^ , j = 0, 1, ..., Jn}, (4.15) 

and a > 1 is an arbitrary constant. Note that cardinality of A^ does not exceed log„n since 
card (A/) = 1 + (2/ + 1)-^ log„ (n/T^) < log^n. Define 

A«,„ = max{A G A^ : \\q^^'^ - gi'^||[o,T„] < Cfn-V^T^/i-^^^+i) fo^ ^11 h G A,, (n'^T^)^ <h<X}, 

(4.16) 
where 

Cf > fi^WKif (4.17) 

and fi is defined in Assumption (A5). 
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We then estimate q^''' by 



TmX^-^M^ — ^ 1 (*i -ii-i)yi, / = o, ...,r, 



(4.18) 



y,n 



and plug (4.18) into (4.12) or (4.13). 



Note that the resulting estimators /„ are inherently adaptive to the smoothness of the underlying 



function / in (1.2) which is rarely known in practice. 



4.4 Adaptive minimaxity 

The following theorem establishes the upper bound for the L'^[0, r„]-risk of the estimator /„ defined 
in Section 14.31 over Sobolev classes: 



Theorem 4. Let fn be an estimator of f of the form (4-lS) or {4-13) where q^ (t) is given by 



(4--18). Let Assumptions (A1)-(A5) hold and kernels Ki, I = 0,...,r of orders {u,l), u> r satisfy 



the conditions (Kl) and (K2). Then, for all 1 < m < u — r and ^4 > 0, 



sup E\\fn - f\\foT„ 



O 



7^^ \ 2(m+r) + l 

n 



(4.19) 



Under the additional conditions on / and T„,, the results of Theorem |4] can be easily extended 
to the entire positive half-line: 

Corollary 1. Let conditions of Theorem [^ hold and also there exists p > 1 such that 
/q°° t^P f'^{t)dt < oo and lim„^oo Tn ^n < oo. Let fn be as in TheoremlAfor t < Tn and fn = for 
t > Tn- Then, 



feH^{A) 
for all 1 < m < u — r and A > 0. 



sup S||/„-/||2 , = 



l[0,oo) 



7^^ \ 2(m + r) + l 

n 



Note that the upper bounds established in Theorem|4]and Corollary[T]coincide with the minimax 
lower bound for the risk obtained in Theorem [T] and, thus, cannot be improved. Hence, the derived 
Laplace deconvolution estimators are asymptotically adaptively minimax over an entire range of 
Sobolev classes. 

5 Examples of explicit Laplace deconvolution estimators 

In what follows, we shall consider two examples of construction of explicit estimators of / in the 
Laplace convolution problem. 
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Example 1. Consider (1.2) with 

g(t) = {bt - sm{bt))e~°'\ a > 0. 
It is easy to see that r = 4 and -B4 = b^ in Assumption (A2), and G is of the form 

G{s) = b\s + a)-\is + af + b^)-\ (5.1) 

Hence, G{s) has no zeros and, consequently, one can use Theorem K^l for recovering and estimating 



/. By (4.5) one has 



$(s) 



4a 60^ + 52 4a3 + 2a62 a'^ + a%'^ 

— + 2 + 3 + 4 

S S^ S"^ s^ 



so that, in (4.8) and (4.13), one has aoo = — 4o, aoi = — (60^ + b'^), 092 = — (4a^ + 2ab'^), 
ao3 = — (a^ + a?b'^) and i?i>i(a;) = 0. Hence, using (4.13), f{t) can be estimated by 



Ut) = b-' q^^^i^S^ + 4«'?"'a„.3(0 + (6«' + ^')'?"a„ .(*) + (4«' + 2a6^)g'x„ ^ (t) + (a^ + «'&')$a„ „(*) 



where A^ /, / = 0, 1, ..., 4, are defined in (4.16 ). The rates of convergence of the estimator are given 



by formula (4.19) with r = 4, i.e 



sup E\\L-W 



feH"-{A) 



[o,r„ 



O 



2m 
7^2 \ 2m + 9 
n ' 

n 



Example 2. Let us study (1.2) with 



5(t) = e-'^V-i^^t^ a>0, 
i=o •^• 



(5.2) 



where fc > and r > 1 are integers and pQ = 1. In this case, assumption (A2) is satisfied with 
Br = 1 and 



G{s) = {s + a) 



-{k+r) 



Ep^( 



s + a 



\k-j 



3=0 



Therefore, 



Ms) 






k+r 



In particular, for fc = and r = 1, ^(s) has no roots, so that 60 = ~o ^ind "0(3^) = ^.nd we recover 
the result of Dey, Martin and Ruymgaart (1998): f{x) = q'{t)+aq(t). For A; = 1, po = 1 and pi = b, 
one has g{t) = e~"''^{bt + 1) and G(s) = {s + a)~'^{s + a + b), so that G{s) has one root si = —{a + b) 
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of multiplicity ai = 1. Hence, 60 = &^(a + 6), aio = -6^ and 4>i{x) = -b'^{a + b) -^e ("+^)^ in 



formula (4.8) leading to the estimator of / of the form 

m = ?A„,, W + (« - ^)%„,o(*) + b' f KS^ - x^e-^^^'^^dx. (5.3) 

"^ u 

The asymptotic minimax rate of convergence of the estimator ( |5.3| ) is as follows: 

2m \ 



sup i?ll/n-/||fo,T„]=o((^) 



For general values of k and r, the exact form of the solution (4.8) strongly depends on the roots 



of the polynomial V{s) = '^j=oPj{s + a) -^ in the denominator. Let us assume that V{s) has m 
pairs of complex conjugate roots S2/-1 = un +iui2 and S21 = uii—iui2, / = !,••• ,m, and k — 2m real 
roots s;, • • • , Sk, I = k — 2m + !,••• ,k and that all k roots are distinct. Then, V{s) = Y\i^i{s — si) 
and 1/G{s) allows a partial fraction decomposition 

By observing that Yl^j=o'^J^'' ^^ ^^^ quotient of s^~^^ and J2j=o Pj^^~'' ^ o^^ ^^^ recursively evaluate 
Oj, j = 1, • • • , r, in ( |5.4[ ) as 

z-i 
Or = 1, a^-i = - ^ Or-jPl-j, / = !,••• , r. 

j=max(0,i— fc) 



The values of /3; can be obtained by multiplying both sides of equation (5.4) by V{s)/{s — si) and 
setting s = si: 



f3i = {si + a)''+^llis-s,)-\ l = l,--- ,k. 



The respective expression for / is of the form / = /i + /2 where 

r— 1 r 



hit) = gW(t) + ^g(0(i)J^('^')a^-'a,, (5.5) 

k + 

/2(t) = iZPi I e''"9(i-x)rf^, (5.6) 

1=1 J^ 



which can easily be reduced to representation (4.8). Note that s/ and Pi, I = 2m + !,••• , /c, have 



real values while S21-1 and S21 as well as f32i-i and {321, I = 1, • • • ,m-, are complex conjugates of 
each other. Hence, one can rewrite expression for /2 as 

m ..t k „i 

f2{x) = ^ e"'i''[7n cos(u«2a;) + 7/2 sm{ui2x)]q{t - x)dx +^1^1 e^'^'qit - x)dx 

1=1 ''^ l=2m+l •'^ 

where 7/1 = 2Re(/320 and 7^2 = 2Im(/32/)- 

Under assumptions (Al) and (A4), the asymptotic minimax rate of convergence of the estimator 
(5.3) is provided by Theorem El 
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6 Simulation study 

In this section we present results of a simulation study to illustrate finite sample performance 
of the Laplace deconvolution procedure developed above. The data were simulated according to 



the model (1.2) with the convolution kernels gi{t) = e~^*(sin(2i) — 2t) and g2{t) = e~^*(l — 2t) 
corresponding, respectively, to Example 1 and Example 2 with a = 5 and b = —2 and r = 1. 
Two (unknown) functions to estimate were fi{t) = t^e~* and f2{t) = 1 — ^2,2{t) where Tk,e is the 
c.d.f of the Gamma distribution with shape parameter k and the scale parameter 9. The Laplace 



convolution q = f * g which drives the regression in (1.2) have been numerically computed using 
trapezoidal rule for approximation of the integrals. Following construction in Gasser, Miiller and 
Mammitzsch (1985), we built kernels Ki of orders {u,l) for estimating the derivatives g*-" of g, 
I = 0,- ■ ■ ,r for various values of u. In our simulations we used u = 8 as an upper bound of the 
regularity of the kernel since higher values of u lead to numerically unstable computations and 
provide very little advantage in terms of precision. Finally, we used boundary kernels in order to 
stabilized the computations as suggested in Gasser, Miiller and Mammitzsch (1985). 

Figure [2] and [3] provides examples of deconvolution estimators based on single samples of n = 250 
equally spaced points on [0, T] with T = 10 using /i, /2 as unknown functions / and gi, g2 as known 
kernels g. Figure |3] shows that deconvolution estimators show good precision although boundary 
effects remain despite the use of boundary kernels. 

7 A more realistic example: on the road to medical applications 



In this section we provide a preliminary study of the DCE-CT model given by equation (2.1). 
Although at the moment, we do not have observations on the arterial input function (AIF), we 
have a pretty good idea of its general shape. Usually, arterial input function AIF{t) vanishes at 
t = together with several of its derivatives, then it increases for t < xq for some xq > Q and 
subsequently declines exponentially for t > xq- For all those reasons, we model AIF{t) using 



representation (5.2) of Example 2. Solution of equation (2.1) is of the form f{t) = 1 — F{t) where 
F is the distribution function. Practitioners expect that f{t) can be modeled as a weighted sum of 
exponents with weights summing to unity. 

For our simulated example we choose 5 = 0, a = 1 and AIFit) = 0.5e~*t^(t + 1) which 



corresponds to a = 1, r = 3, /jq = 1 and />i = 3 in (5.2). Explicit expression for the estimator in 



this case can be obtained from formula (5.5) 



J u 
Note that the problem is strongly ill-posed in this case and involves estimation of the third derivative 
of q. 
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Figure 2: Laplace convolution q of the known kernel g and the unknown function / (red dotted 
line), noisy observations of q (green pluses) and estimated value of q (plain black line). Upper row: 
/i, lower row f^- Left column: g = gi, cr = 0.005 (r = 4), right column: g = g2, c = 0.05 (r = 1). 

As a test function we chose F{t) to be a mixture of exponential distributions, in particular, 

1 - F{t) = 7re-*/2 + (1 _ vr)e-2*. (7.1) 



Figure El provides an example of a deconvolution estimator when / is given by formula (7.1) with 
vr = 0.2. Reconstruction is based on a single sample of n = 500 points on [0,T] with T = 15. The 
noise level a = 0.05. It is easy to see that the errors in the reconstruction of / come from the 
difficulty in estimating the third derivative q oi q mostly at the boundaries. 



8 Discussion 

In the present paper we consider Laplace deconvolution on the basis of discrete noisy data observed 
on the interval which length T„, may increase with a sample size n. Despite the fact that this problem 
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Figure 3: The true unknown / (red dotted line) and its estimate (plain black line). Upper row: /i, 
lower row /2. Left column: g = gi, cr = 0.005 (r = 4), right column: g = g2, cr = 0.05 (r = 1). 

arises in a variety of applications, to the best of our knowledge, it has not been systematically 
studied in statistical literature. The objective of this paper is to fill this gap. 

Our main contribution is the explicit construction of the adaptively rate-optimal Laplace 
deconvolution estimator. We show that the original Laplace deconvolution problem can be 
essentially reduced to estimating regression function and its derivatives. Although the latter 
problem has been well studied on a finite interval, the asymptotic increase of its length as the 
sample size grows raises a new challenge. Whereas the forms of the estimators essentially remain 
the same, the optimal choices of the tuning parameters and the minimax convergence rates, which 
are expressed now in terms of T^/n, are affected by asymptotic growth of the length of the interval. 

We utilized kernel estimation with a global bandwidth adaptively chosen by the Lepski 
procedure (e.g., Lepski, 1991) and established asymptotic minimaxity of the resulting Laplace 
deconvolution estimator over an entire range of Sobolev classes. One can apply however other 
types of estimators (e.g., local polynomial regression, splines or wavelets). In particular, we believe 
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Figure 4: Deconvolution estimation example for AIF[t) = 0.5e~*t^(t + 1) and a = 1. Top left: 
the true third derivative q^^' of q (dotted line) and its estimator (plain black line). Top right: the 
true first derivative q of q (dotted line) and its estimator (plain black line). Bottom left: the true 
q (plain black line) and observations (crosses). Bottom right: the true unknown / (black dotted 
line) and its estimate (plain black line). 

that the use of wavelet-based methods can extend the adaptive minimaxity range from Sobolev to 
more general Besov classes. 

We illustrate the theory by examples of construction of explicit expressions of estimators of / 



based on observations governed by equation (1.2) with various kernels. A limited simulation study 



shows, that in addition to providing asymptotic optimality, the proposed Laplace deconvolution 
estimator demonstrates good performance in finite sample examples as well. Finally, we provide a 



preliminary study of the DCE-CT model given by equation (2.1). 



The present paper essentially provides the first comprehensive statistical treatment of Laplace 
deconvolution problem though a number of open questions remain beyond its scope. In particular. 
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an interesting challenge would be to study Laplace deconvolution with unstable resolvents, where 



Assumption (A4) does not hold. Another important problem would be to study the equation (1.2) 
when the kernel g is not completely known and is estimated from observations. In this case, the 
value of r is not very easy to estimate, especially, when the number of observations in the vicinity 
of the origin is small, and, thus, methodology developed above cannot be automatically extended 
to this case. 
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9 Appendix 

Throughout the proofs we use C to denote a generic positive constant, not necessarily the same 
each time it is used, even within a single equation. 

Proof of Theorem [l] 

Although the rates are derived by standard methods described in, e.g., Tsybakov (2009), the 
challenging part of the proof is construction of the set of test functions and, subsequently, producing 
upper bounds for the Kullback-Leibler divergence. 

The main idea of the proof is to find a subset of functions J- C H"^{A) such that for any pair 
/i, /2 G-^, 

"■ 'l[0,' 
and the Kullback-Leibler divergence 



ll/i - Allforj > 4C(T,?>i-i)*»/<=<"'+'l+" (9,1) 



»C(F/.,F/.)= ll'"-J'll-- <!^^£^. (9,2) 

where log stands for natural logarithm and q-j = {g * fj){ti), i = 1, ...,n, j = 1,2. The result will 
then follow immediately from Lemma A.l of Bunea, Tsybakov and Wegkamp (2007). 

Without loss of generality, let us assume that the points are equally spaced, i.e. tj — tj_i = Tn/n, 
i = 1, ...,n. To construct such a subset J^, define integers M„ > 8 and N = -^ , the largest integer 
which does not exceed n/Mn- Let An = NTn/n and define points zi = I Xn, / = 0, 1, ...,Mn- Note 



that the latter implies that points of observation tj = jT^/n in equation (1.2) are related to zi as 



zi = tj where j = Nl for / = 1, ..., Mn and j < NMn- Note also that j^ < A^ < |f 
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Let /c(-) be an infinitely differentiable function witli supp{k) = [0, 1] and sucli that 

x-^k{x)dx = 0,j = 0, ...,r — 1, / x^k{x)dx^O. 
Jo 



(9.3) 



Introduce functions 






l = l,...,Mn, 



where the constant L > wiU be defined later. Note that ipj have non-overlapping supports, where 

SUpp{(pj) = [Zj~l,Zj]. 

Consider the set of all binary sequences of the length Mn > 8: 

n = {u = (l^i, ...,c^mJ, ^j = {0, 1}} = {0, 1}^^" 
and the corresponding subset of functions 



M„ 



J' = {U- fUt) = ^wj^j{t), ^ e n}. 



(9.4) 



Here fi C il is such that log2card(r2) > M„/8 and the Hamming distance p(ui,u:2) = 
HftiH^ij / ^2j} > Mn/8 for any pair wi,cj2 e ^ (see, e.g., Lemma 2.9 of Tsybakov (2009) for 
construction of $7). 



We now need to show that T in (9.4) is exactly the required set. Note first that since the 



supports of (fj are non-overlapping, for any fui ^ F & straightforward calculus yields 



Af„ 



X2s+1 



||/.||fo,T„] < E 11^^-11' = L'^M^Wkf = L^X 



2\2m||;j|2 ^ r2||,j|2 



i=i 



Similarly, 



M„ 



L2 



-I J-n 



and therefore fu; € H^"^'{A), where A = L||/c||//m. Furthermore 

WfuJl — fuJ2 I I [0,T„] — ^ rp 



\2m+l \2m+l /i^ 



and (9.1) holds provided An > C{T^n ^) i/(2(m+^)+i) fQj^ some positive constant C. 



To verify (9.2), note that 

n 2 



(9.5) 



i=l 



i=i 



where, suppressing index j, we write 



«(/) 



E 



g{ti - x)f{x)dx 



L^Xl^ 



E 



M„ 



«=1 



E^P / g{U-x)k 

'O V ^n 



X - Z;_i 



dx 



In order to obtain an upper bound for Q{f) we need the following supplementary lemma, the 
proof of which is presented at the end of the section. 
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Lemma 1. Introduce functions Kj(x) using the following recursive relation 



Ki{x)= k{t)dt, Kj{x)= Kj^i{t)dt, j = 2,...,r. 
Jo Jo 



(9.6) 



Then, under condition (9.3), functions Kj{x), j = l,...,r, are uniformly bounded and Kj(l) = 0, 
j = 1,- ■ ■ ,r. Moreover, 



g{ti — x)k { ^ \ dx = \l 



An 



BrKr ( ^' /' ' ) I(zi-1 <yi< Zi) 



+ 



ram{zi,ti) 



min(z;_i,ti) 



A„ 



g('')(t,-x)K, ( ^ /'"^ \dx 



An 



Applying equation (9.7) to the integral in Q{f), obtain 
where 

n r M,. 

ti - zi^i 



(9.7) 



(9. 



Ai = E 



A, 



j=i 






Y^BrKr 

.1=1 

^'^n /■niin{zi,ti) 

1=1 -'min{2i_i,ti) 



An 



Kzi^i <yi< zi) 



g^'\ti - x)Kr { "^ /'"' \dx 



Observe that for any t and any /i and I2 such that ^1 7^ ^2, one has Kr{\n^{t — zi.^))Kj.{Xn^(t — 
zi^)) = 0. Also, for each i, Kr{\n'^{ti — zi)) / for only one value of /, namely, for / = [i/N\ + 1 
where [x] is the largest integer which does not exceed x. Therefore, 



Ai < Bl Y^ Kl 



i=l 



ti - Zu 



An 



< n BlWKrWl, 



(9.9) 



where || • ||oo is the supremum norm. In order to obtain an upper bound for A2, observe that for 
any nonnegative function F{x) one has 

min{z;,ii) r-zi 

F{x)dx < / F{x)dx. 



Hence, we derive 



n 


.1 = 1 -^^i- 


-1 


n 


i^rllL 


-M 

1 



g^'\ti-x)Kr 



X - Zl^i 
An 



dx 



Y, \g^'-\t,-x)\dx 
.1=1 -^^'-1 



<n\\Q^''^\\HKr 



(9.10) 



Combining formulae (9.5)-(9.10), we obtain that, in order to satisfy the condition (9.2), we need 
the following inequality to hold 



nrS \2m+2r 



K(P/„P/,)< , 



n 



a^Tn 



Kr\\l[Bl + h^^m]<^.''^- (9.11) 
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Note that ^ (l - ^) < A„ < |i;^. Choosing M„ = Cnl/(2(-+r)+l)^i2(™+r)-l)/(2(„^+r)+l) ^^^ 
observing that r„/M„ ^ as n ^ oo, obtain An > r„/(2M„) > C7(r2n-i)i/(2(™+0+i). Therefore, 



both conditions (9.1) and (9.2) hold and theorem is proved. 
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Proof of Theorem [2] 

To prove Theorem [2] we utihze the fohowing Lemma [2] which is essentiahy Theorem 7.2.4 of 
Gripenberg, Londen and Staffans (1990, Chapter 7) adapted to our notations. 



Lemma 2. Let Sg be such that 



inf |G(s)|>0 and hm |s''G(s)| > 0. 

Re{s)=Sg |3|— »00 

Re{s)>Sg 



(9.12) 



Then, solution (/>(•) of equation (4-S) can be presented as 

L ai-l 



m = EEl''''''+Mt) 



1=0 j=0 



J! 



(9.13) 



where L is the total number of distinct zeros si of s'^G{s) such that Re{si) > Re{sg), ai is the order 
of zero si and 0i G Li . 



Choose Sg such that s* < Sg < 0. Then, the first condition in (9.12) immediately follows from 
Assumption (A4). To validate the second assumption in (9.12), note that for s = si + is2 
conditions Re{s) > Sg and |s| — )• oo imply that either si — )• cxd or |s2| — ^ co, or both. Recall 
that s^'G(s) = Br + G(^)(s). If si — )• oo, no matter whether S2 is finite or S2 — ?• oo, one has 



lim \s''G{s)\= lim |B,. + / g^''\t)e-'^dt\ = \Br\ > G. 



(9.14) 



If si is finite, si > Sg, and |s2| — )■ oo, then Laplace transform G('')(s) = J^ g^^'{t)e ^^dt 
is equal to Fourier transform J^[g^^' {t)e^^'^'']{s2) of function g^^''{t)e~^^^ at the point S2- Since 
5W(t)e-*i* G Li(M+), one obtains 

lim / g^'''\t)e-''dt= lim T[g^'-\t)e-''\s2) = 0, 
|s2|-S'OoJq \s2\^oo 



and (9.14) holds again. Hence, the second assumption in (9.12) is valid, and Lemma [2] can be 
applied. 

Note that, under Assumption (A4), G{s) has no zeros with Re{s) > Sg and, therefore, s^G{s) 
has a single zero of r-th order at s = 0. Lemma ^yields then that 0(t) = (/>o(t) + 4>i{t), where 

r-l 

Mt) = Y.^*'^ «,=</'^^'^(0), (9.15) 

j=o ^■ 
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and integrating by parts, one has 



r-l 



/ q^^\t - T)Mr)dT = Y, 4~'''\0)q^'Ht), 
Jo 



(9.16) 



i=o 



that completes the proof. 

D 

Proof of Theorem |3] 



From (4.5), ^{s) has poles si, I = 0,- ■ ■ , M, of respective orders ai, where sq = and ao = r. Note 
that 

lim \s''G{s)\ > 

s| — f oc 

Re{s)>Sg 



(see the proof of Theoreml2|) and, therefore, $ does not have a pole at infinity. Then, <I> is a rational 
function and, consequently, can be represented using Cauchy integral formula 

M 



Hs) 



Tri f -^ 



27ri 



1=0 



Hz) 



dz 



CiZ-s 



where Q, / = 0, ..., M, is a circle around the pole si such that this circle does not enclose any other 
pole of $ (see LePage, 1961, Section 5.14). Using Laurent expansion of ^{z) around s/, we have 

ai-l 



Ii{s) 



1 



Hz) 



27ri ,Iq z — s 



dz 



E 

j=0 



1 



d°'i-o-^ 



{s-siy+^ {ai-l-j)\ ds^i-i-^ 



[s-si)^^Hs) 



S = Sl 



Combining the last two expressions and taking inverse Laplace transform of ^{s) yields 

M ai-1 



Ht) = Y.Y.^*''^''' = Mt) + Mt), 



1=0 j=0 



J'- 



where 4>o is given by (9.15), same as before, and 

M ai-1 



1=1 j=0 



J! 



(9.17) 



Repeat calculations in (9.16) and also note that, by similar considerations, for every j = 0, ...,«; — 1, 
one can write 



t r-l ,fc 

^-^ dx" 

k=0 



+ 



x=0 






To complete the proof, evaluate the derivatives, observe that 



d^ 
dx^ 



[x^e''^] 



x=0 

and interchange summation with respect to j and k. 

a 



-k-j 
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Proof of Theorem S] 



Since the estimator (4.13) is just a particular form of the estimator (4.12), it is sufficient to carry 



out the proof for the estimator (4.12) of /. From (4.7) one immediately has 



r + 2 



r-l 



mn-f\\l,T„] < ^ I ElWtl - '^^^%^^^^ + E«0.-l-.^llC - <i^%,T„^ + llC *^l-r'* 0l|iro,T„] 



,(r) 



j=0 



(9.18) 
The proof is based on the foll owing proposition which provides upper bounds for the risks 



E\\q^i^ -q^^%T„], i = 0,...,rin(|9.18D: 



Proposition 1. Let Assumptions (A1)-(A5) hold and kernel Kj of order (r + tuq — l,r), with 
1 < Jn-o < u — r + 1, satisfies Assumptions (Kl) and (K2). Then, 



sup E\\qf -g0-)||2 O 

for all j = 0, ■■■,r, where 1 < m < mo and A' > 0. 



J^^ \ 2m + 2j + l 



n 



(9.19) 



In particular, Proposition 111 implies that the errors of estimating g"' in (9.18) are dominated 
by the estimation error of the highest order derivative q^"^'. Furthermore, 4>i G Li (see Theorem^ 
and, therefore, 



Sr) 



*^i-q^'^ *<Pih<\\<Pi\\i-\\qn^ -q^''\ 



o 



(r) (r)\ 

i\ -q^ '\ 



(see also Theorem 2.2 of Gripenberg, Londen and Staffans, 1990). 



Thus, (9.18) yields 



E\\fn-f\\fo,T„]=0[E 



Jr) 



7^11? ■ 

J 1 1 [o,r„ 



O 



7^^ \ 2m + 2r + l 



n 



a 



Proof of Proposition [T] 

For simplicity of notations we drop the index n in Xj^n- 

By the standard asymptotic calculus for kernel estimation (see, e.g., Gasser and Miiller, 1984) 



for estimator (4.14) and any interior point t of (0, T„), one has 

2 " 



Var[q'f\t) 



^2 I ^i ^ 



j=l 



a' Tr, 



A^J+i n 



K'^Au)du{l + o{l)). 
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The required boundary corrections ensure the same order of error for the values t close to the 
boundaries (Gasser and Miiller, 1984) and the integrated variance then is 



VjiXj) 



^" Var {q^ht)] dt = Vo, -^ (1 + o(l)), 



A/ n 



where Vqj = a'^\\Kj\\'^ . Similarly, the integrated squared bias can be written as 

B]{\,,q) = £" (^E (^{t)^ - g(^)(t)) dt = i?o,Af(l + 0(1)), 
where Boj = 2\\q'^"''+^'>\\^\\Kj\\^ {{m + j)])'^ {m + j)-^. Hence, 



(9.20) 



(9.21) 



rp-2 



sup E\\q^^^ - q(^%^^^^ = _sup {V,{X,) + B]{X„q)) = O [ -^^ ) + 0(Af ). (9.22) 



q£H'^+^{A) 



gg/fm + r(^) 



A/ n 



It follows from (9.20) and (9.21) that the asymptotically optimal bandwidth that minimizes 



piu^') «(i)||2 is 



A*=0 



T^\ 2{m+j) + l 



n 



(9.23) 



and the corresponding risk of estimating q^^' is given by 



sup ^\\qx* Q llo,T„ 

qGH^+^iA') ^ 



o 



J"^\ 2{m+j) + l 



n 



(9.24) 



Now we need to prove that (9.24) remains valid when A^ is replaced by Xj selected by Lepski 



procedure, that is. 



sup 



M|J/)_„(i)||2=0 



for all J = 0, ..., r, where r < m < rriQ and A' > 0. 



7^^ \ 2m+2j + l 

n 



Denote dj = {Cj - n\\Kj\\)/{V2\\Kj\\) and set A* in (9.23) to be 



A*=K 



2(A 



M2 



n 



Note that 



Eh? - q^'^f = E \ IkS^;^ - g(^')f /(A, > A*) U ii; <! ||gf - (z(^')f /(A, < A*) [> = Ai + As. 



For Xj > A*, (9.24) and (4.16) imply that uniformly over q £ H'^'~^''"{A') 



Ai < 2£; ||gi'^-g\i^r/(A,>A*) +2i? Il^y; 



O (n-ir2(A*)-i/(2j+i)) + O ((n-ir2)-5^^^^t+i) = O (^{n'^T^y^^^^^ . (9.25) 
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For (n r^)2j+i < Xj < A*, by direct calculus similar to that carried out above, one can show that 






Hence, 



qgj^m + r(^) 



sup A2 < sup jE\\q^J'^-q(J)\\^Jp{Xj<X*) 



geH"^+-^{A) 



O sup \/P{Xj < X*) 

\qeH"^+''(A) 



(9.26) 



Jj) Ji)ii2 



From the definition (|4.16|) of Xj, for A* > Xj there exists h < X* such that \\q\J — q^ \\ > 



^* '^ 



C72n-V2r2/i-(2i+i), where, by (4.17) and definition of dj, we have Cj = \\Kj\\{n + V2dj). It 



follows from (9.20) and (9.21) that for all h < X*, the variance term dominates over the squared 
bias, that is, 

sup \\E^^ - q^^^W^ < dy\\Kjfn-'T^h-^''^+'\ 

Hence, for ah h < X* and q £ H"'+'\A') 

P (llgf - qii\? > cT2c|n-ir2/i-(2.^-+i)') < P U^^ - Eq^\\'' > a^\\K,f{pi + d,fn-'T^h-^^^+'^ 

(9.27) 
due to Cj — dj > ||Kj|p(/i + dj)2. Thus, uniformly over q £ H'^^''"{A'), one has 



P(A,<A*) < Y.PCh = h)pl\\q^^}-qll^f>a'C]n-'T^h 





heAj 
h<X* 




heAj 
h<X* 


Note that 




11^- 


F«(-?')||2 

-^Qh II - 



(9.28) 



(11^ - ^^f > ^'WK.fif. + d,fn-'T^h-('^^'^^ 



^/,-0+i)K,(^)(t,-t,_i)6, 



h 



h~{2j+l)n-2j.2 ^Tq^ 



where Q is an n x n symmetric nonnegative-definite matrix with elements 



n 



Qii = -^{ti — ti-i){ti — ti 



-^) f^A 



z)Ki z + 



tj — u 
h 



dz. 



(9.29) 



Then, 
P f 11^ - E^\\^ > a^\\K,f{^, + d,)'n-ir>-(2^-+i) ) = P {e^Qe > n\\Kjf{fx + d,)^) . (9.30) 
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Applying a x^'type inequality which initially appeared in Laurent and Massart (1998), was 
improved by Comte (2001) and furthermore by Gendre (2010) in his Ph.D. Thesis, we derive 
that, for any x > 0, 



P ( a-^e'^Qe > ly^TriQ) + VxpLx(Q) 



where Tv[Q) is the trace of Q and p^s^xiQ) is the maximal eigenvalue of Q. Note that 

2 ^ 



(9.31) 



TV(Q) = -2 Y^iti - ti-if\\Kjf < nfi^Kjf. 



i=l 



and /Omax(Q) is ^^^ spectral norm of matrix Q which is dominated by any other norm. In particular. 



Pmax(<3) <max^|QH| 
1=1 



n 



-^max(ifc -tk-i] 

1^ k ,1 _i 



K,(z) 



E 

.1 = 1 



Ki[z + 



tk — tl 

h 



{tl - t/-i) 



dz. 



Since 



E 



1=1 

we have 



Ki[z + 



tk — tl 
h 



(ti-ti-i) 



KAz + 



tk — t 
h 



dt{l+o{l)) = h 



Kj[z + \-y 



dt(l+o(l)). 



n 



Prm^xiQ) < 7f^ max 

nh 

' T 



TJ k 



{tk — tk-i) h 



1 rl 



\Kj{z)\\Kj{z + tk/h - y)\dzdy 



1 J-i 



\Kj{z)\dz 



<M\Kj\ 



nh 



Using inequality (9.31) with x = djTn/{2fih) and h < X* one obtains 



P{\\ql!^-Eqh 



0-)||2 



> 



a^K,f{^^ + d,)^T^ 



n 



/i2j+i 



(d^T \ / 1 2m + 2j-l 

— ^ < exp ( -Cjn2™+2i+ir„""+'^+' 



(9.32) 
where Cj depends on m, A', /i and dj. Combination of (9.25), (9.26), (9.28) and (9.32) completes 
the proof. 
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Proof of Lemma [Tl Definitions (9.6) imply that k{x) = K[{x), K':_^{x) = Kj{x) and 
Kj{0) = 0, j = 1, ...,r. Observe that condition Kj{l) = 0, j = 1, ...,r, is equivalent to 



/ Kj{x)dx = 0, j = 0, ...,r-l, 
Jo 



(9.33) 



where Kq{x) = k{x). It is easy to see that (9.33) is valid for j = 0. For j > 1, note that, by formula 
(4.631) of Gradshtein and Ryzhik (1980), 



Kj{x) 



dzj^i 



Z,_l 



dz 



■i-2 • • • 



Zl 



k{z)dz 



1 



(j-1)! Jo 
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{x — zy k{z)dz. 



(9.34) 



Then, for any x G [0, 1], one has |X'j(x)| < [(j — 1)!] "*" ||fc||oo Jq{x — z^ ^dz < ||A;||oo- Moreover, 



by (9.34), for j = 1, ..., r — 1, one has 
"1 



Kj{x)dx 







(j - 1)! Jo 



1 fi _ 1 

k{z)dz j {x — zy dx 



U - l)!i! Jo 



(1 - zy k{z)dz = 0. 



Now, it remains to prove formula (9.7). Note that support of the function k{u/\n — {I — 1)) 
coincides with (z;_i,2;;), so that 

g(ti - x)k r \ dx = 

V '^n / Jmin(2;_i,t,) 



g{ti - x)k [ i 1 dx. 



(9.35) 



Formula (9.35) implies that I{i,l) = whenever z;_i > y^. If zi^i < yi < zi, it follows from (9.35) 

that 

'x-zi^i 



I{i,l)= r g{ti-x)kr- 



Xr, 



dx. 



Introduce new variable t = x — z/_i and denote uu = ti — zi-i. Then, recalling condition (A2) and 
using integration by parts, we derive 

/■"« f t \ ft 

I{i, I) = I g{uii - t)k [ ^- ] dt = \ng{uii - t)Ki 






n / g'{uii - t)Ki {^]dt 
Kg^'-^Hua - t)Kr (^) + a; £' g\ua - t)K,. [^ dt. 



Changing variables back to x, we arrive at 

'ti - Zl^i 



i{hi) = K 



Bj- Kj- 



Xr 



+ / g'^'\t,-x)Kr{'^^^^\dx 



zi-\ 



Xn 



(9.36) 



Finally, consider the case when zi < yi. Then, using relation zi = zi^i + A„, integration by 
parts and the fact that Kj(0) = Kj{l) = for j = 1, ...,r, we obtain 



Iii,l) 



g{ti - x)k \dx = Xn / g{ti - zi^i - Xnt)k{t)dt 

zi_i \ ■^n / Jo 

■ ■ = A;+i / g^'iti - zi_i - Xnt)Kr{t)dt = XI r g^^\ti - x)K, 

Jo Jzi.i 



r \ ^ I ^X 



which, in combination with (9.36), completes the proof. 
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